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The standard noise model in gravitational wave (GW) data analysis assumes detector noise is 
stationary and Gaussian distributed, with a known power spectral density (PSD) that is usually 
estimated using clean off-source data. Real GW data often depart from these assumptions, and 
misspecified parametric models of the PSD could result in misleading inferences. We propose a 
Bayesian semiparametric approach to improve this. We use a nonparametric Bernstein polynomial 
prior on the PSD, with weights attained via a Dirichlet process distribution, and update this using 
the Whittle likelihood. Posterior samples are obtained using a blocked Metropolis-within-Gibbs 
sampler. We simultaneously estimate the reconstruction parameters of a rotating core collapse 
supernova GW burst that has been embedded in simulated Advanced LIGO noise. We also discuss 
an approach to deal with non-stationary data by breaking longer data streams into smaller and 
locally stationary components. 
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I. INTRODUCTION 

Astronomy is entering a new and exciting era, with 
the second generation of ground-based gravitational wave 
(GW) interferometers (Advanced LIGO [T], Advanced 
Virgo [2] , and KAGRA m) expected to reach design sen¬ 
sitivity in the next few years. Throughout history, de¬ 
velopments in astronomy have led to deeper understand¬ 
ings of the universe. Each time we probe the universe 
with new sensors, we discover exciting and unexpected 
phenomena, that challenge our current beliefs in astro¬ 
physics and cosmology. GW astronomy promises to do 
the same, providing a new set of ears to listen to (poten¬ 
tially unanticipated) cataclysmic events in the cosmos. 

Apart from the first direct observation of GWs, ex¬ 
tracting astrophysical information encoded in GW sig¬ 
nals is one of the primary goals in GW data analysis. 
Since observations are subject to noise, accurate astro- 
physical predictions rely on an honest characterization 
of these noise sources. At its design sensitivity. Ad¬ 
vanced LIGO will be sensitive to GWs in the frequency 
band from 10 Hz to 8 kHz. The main noise sources for 
ground-based interferometers include seismic noise, ther¬ 
mal noise, and photon shot (quantum) noise [1] . Seismic 
noise limits the low frequency sensitivity of the detec¬ 
tors. Thermal noise is the predominate noise source in 
the most sensitive frequency band of Advanced LIGO 
(around 100 Hz), and arises from the test mass mirror 
suspensions and the Brownian motion of the mirror coat¬ 
ings. Photon shot noise is due to quantum uncertainties 
in the detected photon arrival rate, and dominates the 
high frequency sensitivity of the detectors. 
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Standard assumptions about the noise model in the 
GW data analysis community rely on detector noise be¬ 
ing stationary and Gaussian distributed, with a known 
power spectral density (PSD) that is usually estimated 
using off-source data (not on a candidate signal) [i] . Real 
GW data often depart from these assumptions [5]. It 
was demonstrated in [5] that fluctuations in the PSD can 
moderately bias parameter estimates of compact binary 
coalescence GW signals embedded in LIGO data from 
the sixth science run (S6). 

High amplitude non-Gaussian transients (or 
“glitches”) in real detector data invalidate the Gaussian 
noise assumption, and misspecifications of the paramet¬ 
ric noise model could result in misleading inferences 
and predictions. A more sophisticated approach would 
be to make no assumptions about the underlying noise 
distribution by using nonparametric techniques. Unlike 
parametric statistical models, which have a fixed and 
finite set of parameters (e.g., the Gaussian distribution 
has two parameters: pi and representing the mean 
and variance respectively), nonparametric models have a 
potentially infinite set of parameters, allowing for much 
greater flexibility. 

The theory of spectral density estimation requires a 
time series to be a stationary process. If data is not sta¬ 
tionary (which is often the case for real LIGO data), it 
is important to adjust for this by introducing a time- 
varying PSD. It was demonstrated in [7] that the noise 
PSD in real S6 LIGO data is in fact time-varying. Vari¬ 
ation in detector sensitivity was also shown in [8] . Other 
GW literature that discusses non-stationary noise include 
mm- It would be an over-simplification to assume the 
Advanced LIGO PSD is constant over time, and to use 
off-source data in characterizing this. On-source estima¬ 
tion of the PSD would therefore be preferable to mitigate 
the time-varying nature of the PSD. 

There have been attempts reported in the literature 
to improve the modelling of noise present in GW data. 
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primarily concentrating on noise with embedded signals 
from well-modelled GW sources, such as binary inspi¬ 
rals HEIHIl], and more recently from GW bursts (un¬ 
modelled and typically short duration events) [71 (TS]. 

Under the Bayesian framework, Rover et al [a used 
a Student-t likelihood as a generalization to the com¬ 
monly used Whittle (approximate Gaussian) likelihood 
flB] , The beneht of the Student-t set-up is two-fold: un¬ 
certainty in the noise spectrum can be accounted for via 
marginalization of nuisance parameters; and outliers can 
be accommodated due to the heavy-tail nature of the 
Student-t probability density. A drawback of this method 
is that the choice of hyperparameters can unduly influ¬ 
ence posterior inferences. 

Using the maximum likelihood approach. Rover |12j 
later demonstrated that the Student-t likelihood could 
be used as a generalization to the matched-filtering 
detection method commonly used in the analysis of 
GW signals from well-modelled sources. This approach 
would not be appropriate for GW bursts, since matched- 
filtering requires accurate signal models with well-dehned 
parameter spaces. 

Littenberg and Gornish [13] used Bayesian model se¬ 
lection to determine the best noise likelihood function 
in non-Gaussian noise. They considered Gaussian noise 
with a time varying mean, noise from a weighted sum of 
two Gaussian distributions (non-Gaussian tails), and a 
combination of Gaussian noise and glitches (modelled as 
a linear combination of wavelets). 

Littenberg et al [3] demonstrated how one can incor¬ 
porate additional scale parameters in the Gaussian likeli¬ 
hood, and marginalize over the uncertainty in the PSD to 
reduce systematic biases in parameter estimates of com¬ 
pact binary mergers in S5 LIGO data. This method re¬ 
quires an initial estimate of the PSD. On a related note, 
Vitale et al [14] highlighted a Bayesian method, similar 
to iteratively reweighted least squares, that analytically 
marginalizes out background noise and requires no a pri¬ 
ori knowledge of the PSD. They applied this to simulated 
data from LISA Pathhnder. 

More recently, Littenberg and Cornish 0 introduced 
the BayesLine algorithm in conjunction with BayesWave 
to estimate the underlying PSD of GW detector 
noise. BayesLine is used to model the Gaussian noise 
component. They use a cubic spline to model the smooth 
changing broadband noise and Lorentzians (Cauchy den¬ 
sities) to model wandering spectral lines (due to AC sup¬ 
ply, violin modes, etc.). BayesWave, on the other hand, 
models the non-Gaussian instrument “glitches” and burst 
sources with a continuous wavelet basis. Both meth¬ 
ods make use of the trans-dimensional reversible jump 
Markov chain Monte Carlo (RJMCMC) algorithm of 
Green m- BayesLine is very pragmatic and works re¬ 
markably well on real Advanced LIGO data. However, 
the authors did not consider statistically important no¬ 
tions such as the posterior consistency of the PSD m- 

Our approach to improving the GW noise model re¬ 
lies on developments over the past decade in the area 


of Bayesian nonparametrics. Since parametric modelling 
can lead to biased estimates when the underlying para¬ 
metric assumptions are invalid, we prefer nonparametric 
techniques to estimate the PSD of a stationary noise time 
series. 

A common nonparametric estimate of the spectral den¬ 
sity of a stationary time series is the periodogram, calcu¬ 
lated using the (normalized) squared modulus of Fourier 
coefficients. That is. 
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Ae(-7r,7r], (1) 


where A is the frequency, and {Xt} is a stationary time 
series, where t = l,2,...,n represents discretized time. 
The periodogram randomly fluctuates about the true 
spectral density of a time series, but is not a consis¬ 
tent estimator, motivating methods such as periodogram 
smoothing and averaging m- Averaging of off-source 
periodograms from Tukey windowed simulated Advanced 
LIGO noise has been used in GW literature relating 
to reconstructing rotating core collapse GWs [7D], and 
predicting the important astrophysical parameters from 
these events [2T]. 

In this paper, we implement the nonparamet¬ 
ric Bayesian spectral density estimation method and 
Metropolis-within-Gibbs Markov chain Monte Garlo 
(MGMG) sampler presented by Choudhuri et al [22], 
which updates a nonparametric Bernstein polynomial 
prior [731 m] on the spectral density using the Whit¬ 
tle likelihood to make posterior inferences. A Bernstein 
polynomial prior is essentially a hnite mixture of Beta 
probability densities (see Section II G and Appendix A). 
It was proved that this method yields a consistent estima¬ 
tor for the true spectral density of a (short-term memory) 
stationary time series [22] — an attractive feature, absent 
in the periodogram. Posterior consistency in this context 
essentially means that the posterior probability of an ar¬ 
bitrary neighbourhood around the true PSD goes to 1 as 
the length of the time series increases to infinity. Thus, 
as the sample size increases, the posterior distribution of 
the PSD will eventually concentrate in a neighbourhood 
of the true PSD [18]. This is an important asymptotic 
robustness quality of the posterior distribution in that 
the choice of prior parameters should not influence the 
posterior distribution too much. Especially in Bayesian 
nonparametrics, because of the high dimension of the pa¬ 
rameter space, many posterior distributions do not auto¬ 
matically possess this quality [13]. We refer the reader 
to Appendix G for a visual demonstration of posterior 
consistency. 

Unlike references [U [TTl [14], we do not treat noise as 
a nuisance parameter to be analytically integrated out. 
Although the signal parameters are our primary concern, 
we are also interested in quantifying our uncertainty in 
the underlying PSD of the noise in terms of posterior 
probabilities and credible intervals. Knowledge of this 
uncertainty will allow us to make honest astrophysical 
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statements. 

In this study, we assume that data is the sum of a GW 
signal embedded in noise (from all noise sources), such 
that 

y = s(/3) + e(0), (2) 

where y is the (coincident) time-domain GW data vector, 
s is a GW signal parameterized by (3, and e is noise 
parameterized by 6. Notation with a tilde on top, such 
as y, refers to the frequency-domain equivalent of the 
same quantity, obtained by the discrete Fourier transform 
(DFT). Note that we are treating noise in this set-up as 
the conglomeration of detector noise (such as thermal 
noise and photon shot noise), background noise (such 
as seismic noise), and residual errors due to parametric 
statistical modelling of GW signals. An important caveat 
is to ensure the magnitude of the errors in the statistical 
model of the signal are minimized, so as to not artificially 
dominate the noise. Estimation of spectral lines (as done 
by the BayesLine algorithm H) is left out of the scope 
of this paper. 

The GW signal could essentially come from any source, 
but in this paper we will restrict our concentration to 
those from rotating core collapse supernovae to simplify 
the problem and demonstrate the power of the method. 
Using the recent waveform catalogue of Abdikamalov et 
al [25], we conduct principal component analysis (PCA), 
and fit a principal component regression (PGR) model of 
the most important principal components (PCs) on an 
arbitrary rotating core collapse GW signal [20l |2T1126| . 
The (parametric) signal component is easily embedded as 
an additional Gibbs step in the Metropolis-within-Gibbs 
MCMC sampler of Choudhuri et al |52|. That is, we 
utilize a blocked Gibbs approach to sequentially sample 
the signal parameters (3 given the noise parameters 6, 
and vice versa. As the model now contains a paramet¬ 
ric signal component as well as a nonparametric noise 
component, it is “semiparametric”. 

To accommodate for non-stationary noise, we adapt 
an idea presented by Rosen et al HZj, and assume a non- 
stationary time series can be broken down into smaller 
locally stationary segments. For each segment, we sepa¬ 
rately estimate the PSD using the method of Choudhuri 
et al [22] . and look at the time-varying spectrum. 

We see this work as being a complement to existing 
methods, with the following benefits: 

• A Bayesian framework, allowing us to update prior 
knowledge based on observed data, as well as quan¬ 
tify uncertainty in terms of probabilistic state¬ 
ments; 

• Posterior consistency of the PSD, i.e., the posterior 
distribution will concentrate around the true PSD 
as the sample size increases; 

• No parametric assumptions about the underlying 
noise distribution (parametric models are very sen¬ 
sitive to misspecifications), and high amplitude 


non-Gaussian transients in the noise can be han¬ 
dled; 

• Non-stationarities can be taken into account by 
splitting the data into smaller locally stationary 
segments; 

• Estimation of noise and signal parameters are done 
simultaneously using Gibbs sampling; 

• Uncertainty in astrophysically meaningful parame¬ 
ter estimates are honest, with less systematic bias 
present; 

• Non-informative priors can be chosen, and the PSD 
does not need to be known a priori; 

• Useful for any signal with a parametric statistical 
model (including rotating core collapse supernova 
GWs). 

The paper is structured as follows: Section II outlines 
the methods and models used to simultaneously esti¬ 
mate signal and noise parameters in GW data; results 
for toy models and simulated Advanced LIGO data are 
presented in Section III; and in Section IV, we discuss 
the consequences of this work, as well as future initia¬ 
tives. Supplementary material can be found in the three 
appendices. 

II. METHODS AND MODELS 

A. Parametric, Nonparametric, and 
Semiparametric Models 

Statistical models can be classified into two groups — 
parametric and nonparametric. Parametric models have 
a fixed and finite set of parameters, are relatively easy 
to analyze, and are powerful when their underlying as¬ 
sumptions are correctly specified. However, if the model 
is misspecified, inferences will be unreliable. Nonpara¬ 
metric models have far fewer restrictions, but are less ef¬ 
ficient and powerful than their parametric counterparts. 
No assumption about the underlying distribution of the 
data is made in nonparametric modelling, and the num¬ 
ber of parameters are not fixed (and potentially infinite 
dimensional). Instead, the effective number of parame¬ 
ters increases with more data, providing the model struc¬ 
ture. 

For example, parametric regression (including linear 
models, nonlinear models, and generalized linear models) 
uses the following equation: 

y = 5(xi,X2,...,Xk|/3)-ke, (3) 

where y is the response variable, g(xi,X 2 ,..., Xk|/3) is a 
function of k explanatory variables (that aim to explain 
the variability in y) given some model parameters ( 3 , and 
e is the statistical error, usually assumed to be indepen¬ 
dent and identically distributed (iid) Gaussian random 
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variables, with 0 mean and constant variance cr^. Here, 
the functional form of g{.) is known in advance, such as 
in linear regression, where we have 


5 ( xi , x 2 , ... ,Xk|/3) = /3o + iSi^i + ■ ■ ■ + ;5feXk. (4) 

Nonparametric regression has a similar set-up, but as¬ 
sumes that the functional form oi g{.) is unknown and to 
be learnt from the data. Function g{.) could be thought 
of as an uncountably inhnite-dimensional parameter in a 
nonparametric setting. 

Semiparametric models contain both parametric and 
nonparametric components. The parametric regression 
model presented in Equations ^ and Q is essentially 
the same parametric model used in this paper for GW 
signal reconstruction, where (xi,X 2 ,... ,Xk) are princi¬ 
pal component (PC) basis functions. However, we model 
the noise e nonparametrically, rather than assuming iid 
Gaussian noise. Since we have parametric and nonpara¬ 
metric components, our model is semiparametric in na¬ 
ture. 


B. Bayesian Nonparametrics 

Bayesian nonparametrics contains the set of models 
on the interface between the Bayesian framework and 
nonparametric statistics, and is characterized by large 
parameter spaces and probability measures over these 
spaces [18]. The Bayesian statistical framework is use¬ 
ful for incorporating prior knowledge, and is particularly 
powerful when these priors accurately represent our be¬ 
liefs. As mentioned in the previous section, nonparamet¬ 
ric methods are useful for constructing flexible and robust 
alternatives to parametric models. A beneht of Bayesian 
nonparametric models is that they automatically infer 
model complexity from the data, without explicitly con¬ 
ducting model comparison. 

Bayesian nonparametrics is a relatively nascent field 
in statistics, and faces many challenges. The most obvi¬ 
ous one is the mathematical difficulty in specifying well- 
defined probability distributions on infinite-dimensional 
function spaces. Gonstructing a prior on these spaces 
can be arduous, and in the case of non-informative pri¬ 
ors, one should ensure large topological support so as not 
to put too much mass on a small region. Further, cre¬ 
ating computationally convenient algorithms to sample 
from complicated posterior distributions presents its own 
set of challenges. It is also important to ensure that a 
Bayesian nonparametric model is statistically consistent 
(the truth is uncovered asymptotically), as some proce¬ 
dures do not automatically possess this quality Hi. 

Bayesian nonparametric priors (and posteriors) are 
stochastic processes rather than parametric distributions. 
Ferguson Hi provided the seminal paper for the field of 
Bayesian nonparametrics, introducing the Dirichlet pro¬ 
cess, an infinite-dimensional generalization of the Dirich¬ 
let distribution, now commonly used as a prior in infinite 
mixture models. This is a popular model (often called 


the Chinese Restaurant Process) for classihcation prob¬ 
lems where the number of classes is unknown and to be 
inferred from the data. A formal definition of the Dirich¬ 
let distribution and Dirichlet process can be found in 
Appendix B. 

Another popular prior in Bayesian nonparametrics is 
the Gaussian process prior, which is often used in nonlin¬ 
ear regression contexts. In fact, one could extend the re¬ 
gression example in the previous section into the realm of 
Bayesian nonparametrics by putting a Gaussian process 
prior on the function g. Compare this to the Bayesian 
parametric counterpart, which puts a prior on the model 
parameters /3. 

For further discussion on Bayesian nonparametrics, we 
refer the reader to HHj. 


C. Spectral Density Estimation 

A weakly (or second order) stationary time series {Xt} 
is a stochastic process that has constant and finite mean 
and variance over time, and an autocovariance function 
7 (/i) that depends only on the time lag h. That is, for a 
zero-mean weakly stationary process, the autocovariance 
function has the form 


-fih)=E[XtXt+h], Vt, (5) 

where E[.] is the expected value operator, and t represents 
time. 

Assuming an absolutely summable autocovariance 
function {J2^=-ao l7(^)l < °°)> the (real-valued) spectral 
density function /(A) of a zero-mean weakly stationary 
time series is defined as 

. OO 

/(A) = — 7 (/i)exp(-ihA), Ae(-7r,7r], (6) 

h— — oo 

where A is the angular frequency. Note that the spec¬ 
tral density function and autocovariance function are a 
Fourier transform pair. In this paper, we will also call 
this the power spectral density (PSD) function, although 
this term is sometimes reserved for the empirical spec¬ 
trum (periodogram) in the GW literature. 

For a mean-centered weakly stationary time series 
{Xt} of length n, with spectral density /(A), the Whittle 
approximation to the Gaussian likelihood, or simply the 
Whittle likelihood US] is defined as 

L„(x|/) cx exp (^log/(Ai) + j , (7) 


where A/ = ‘I'Kljn are the positive Fourier frequencies, 
u = (n—1)/2, [mJ is the greatest integer value less than or 
equal to u, and /„(.) is the periodogram defined in Equa¬ 
tion (11. If the PSD is known, the log / term in Equa¬ 
tion (71 is a constant and can be ignored. The Whittle 
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likelihood has an advantage over the true Gaussian like¬ 
lihood as it has a direct dependence on the PSD rather 
than the autocovariance function. The Whittle likeli¬ 
hood is only exact for Gaussian white noise but works 
well under certain conditions, even when the data is not 
Gaussian [29]. More information about these concepts 
can be found in any good time series analysis textbook, 
such as Brockwell and Davis [50] . 

We now need to specify a nonparametric prior for the 
PSD. We will briefly introduce the spectral density esti¬ 
mation technique of Ghoudhuri et al |22j , which is based 
on the Bernstein polynomial prior of Petrone [231 1^ - 
The Bernstein polynomial prior is a nonparametric prior 
for a probability density on [0, 1], and is based on the 
Weierstrass approximation theorem that states that any 
continuous function on [0, 1] can be uniformly approxi¬ 
mated to any desired degree by a Bernstein polynomial. 
If this function is a density on [0, 1], this Bernstein poly¬ 
nomial is essentially a finite mixture of Beta densities. 
We refer the reader to Appendix A for a definition of 
the Bernstein polynomial and Beta density. Instead of 
putting a Dirichlet prior on the mixture weight vector, 
the weights are defined via a probability distribution G 
on [0, 1] and a Dirichlet process prior is put on the space 
of probability distributions on [0, 1]. Appendix B con¬ 
tains supplementary material on the Dirichlet process. 

Since the spectral density is not defined on the unit 
interval, we reparameterize /(A), such that 

/(ttw) = rg(w), wG[0, 1], (8) 

where r = /(7ra;)daj is the normalization constant. 
To specify a prior on spectral density /(ttw), we put a 
Bernstein polynomial prior on q^uj), using the following 
hierarchical scheme: 

• 9(w) = G (^, |] P{u}\j, k-j + 1), where G 
is a cumulative distribution function, and /3(a;|a, b) 
is a Beta probability density with parameters a and 

b. 

• G is a Dirichlet process distributed random proba¬ 
bility measure with base measure Go and precision 
parameter M. 

• k has a discrete probability mass function such that 
p{k) (X exp(—fc = 1, 2,.... 

• T has an Inverse-Gamma(Q;T,/3 t) distribution. 

• G, fc, and t are a priori independent. 

We use the stick-breaking construction of the Dirich¬ 
let process by Sethuraman m, which is an infinite¬ 
dimensional mixture model (defined in Appendix B). For 
computational purposes, we need to truncate the num¬ 
ber of mixture distributions to a large but finite number 
L. The choice of a large L will provide a more accurate 
approximation but also increase the computation time. 


Here, we choose L = max{20, We therefore repa¬ 

rameterize G to {Zq, Zi,..., Zl, Vi, ..., Vl) such that 

G = -I- (9) 

where pi = Vi, pi = (11^=1 (1 “ E)) ^ ^ 2, 

Vl ~ Beta(l,M) for ^ = 1,...,L, and Zi ^ Gq for 
I = 0,1,...,L. Note that Sa is a probability density, 
degenerate at a. That is, = 1 at a and 0 otherwise. 
This yields the prior mixture of the PSD 

k 

/(ttw) = r^Wj-^fe/3(w|j,fc - j -b 1), (10) 

with weights - ii Po = 

^-EtiPi- 

Abbreviating the vector of noise parameters as 6 = 
(v, z, fc, r), the joint prior is therefore 

p{e) (X ^nM(l - ^n5o(2z)^ p(fc)p(T): 

(11) 

and is updated using the Whittle likelihood to produce 
the unnormalized joint posterior. 

This method is implemented as a Metropolis-within- 
Gibbs MCMG sampler. In Ghoudhuri et al |32|, param¬ 
eters fc and r are readily sampled from their full condi¬ 
tional posteriors, while V and Z require the Metropolis 
algorithm with Uniform proposals. Our only variation 
on this implementation is our sampling of the smooth¬ 
ness parameter fc. We found that a Metropolis step is 
faster than sampling from the full conditional. The orig¬ 
inal implementation contains a for() loop that evaluates 
the log posterior fcmax number of times, where fcmax is 
chosen (during pilot runs) to be large enough to cater for 
the roughness of the PSD. For most well-behaved cases, 
fcmax = 50 will suffice, but the Advanced LIGO PSD re¬ 
quires many more mixture distributions (by one to two 
orders of magnitude) due its steepness at low frequen¬ 
cies. This is a significant computational burden, and a 
well-tuned Metropolis step can therefore outperform the 
original implementation. 

A discussion of the Dirichlet process and stick-breaking 
representation can be found in Appendix B. 


D. Signal Reconstruction 

To reconstruct a rotating core collapse GW signal that 
is embedded in noise, we use the (parametric) principal 
component regression (PGR) method described in (501 
EUESj. That is, let 

y = X/3 -f e. 


( 12 ) 
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where y is the length n frequency-domain GW data vec¬ 
tor, X is the n X d matrix of the d frequency-domain 
principal component basis vectors, /3 is the vector of sig¬ 
nal reconstruction parameters (PC coefficients), and e is 
the frequency-domain noise vector with a known PSD. 
We assume flat priors on f3. It is important to highlight 
that useful astrophysical information (such as the ratio of 
kinetic to gravitational potential energy of the inner core 
at bounce, and precollapse differential rotation) can be 
extracted by regressing the posterior means of the PC co¬ 
efficients (3 on the known astrophysical parameters from 
the waveform catalogue, and sampling from the posterior 
predictive distribution m- 

We include an additional Gibbs step in the MCMC 
sampler described in the previous section to simultane¬ 
ously reconstruct a rotating core collapse GW signal, 
whilst also estimating the noise power spectrum. Omit¬ 
ting the conditioning on the data for clarity, we sequen¬ 
tially sample the full set of conditional posterior densities 
p{9\(3) andp(/3|0), where 0 — (v, z, k, t) are the noise pa¬ 
rameters defined in the previous section, and (3 are the 
signal reconstruction parameters. That is, we sample in 
a cycle from the full conditional posterior distribution of 
the signal parameters, given the PSD parameters, and 
the full conditionals of the PSD parameters, given the 
signal parameters. This set-up is called a blocked Gibbs 
sampler. 

To sample the signal parameters, we fix the most recent 
MCMC sample of the PSD parameters. The conditional 
posterior of /3 is 


P(/3|0) = N(/x,S) (13) 

where S = (X'D-^X)"! and fi = EX'D-^y. Here D = 
2ty X diag (/(A)) is the noise covariance matrix, and /(A) 
is the most recent estimate of the PSD. More explicitly, 
at iteration i+1 in the blocked Gibbs sampling algorithm: 

1. Create time-domain noise vector: = y — 

X/3^*). Due to the linearity of the Fourier trans¬ 
form, (3 will be the same no matter if we are in the 
time-domain or frequency-domain. 

2. Sample the PSD parameters using the 

method of Section II C. 

3. Sample the signal parameters using 

Equation ( [T^ (since the PSD in iteration i -f 1 is 
now known). 


E. Non-stationary Noise 

As mentioned in Section II C, stationary noise has a 
constant and finite mean and variance over time, and 
an autocovariance function that depends only on the 
time lag. Non-stationary noise does not meet these re¬ 
quirements, and has a time-varying spectrum. Station- 
arity of a time series can be tested using classical hy¬ 
pothesis tests such as the Augmented Dickey-Fuller test 


|32) . Phillips-Perron unit root test [33], and Kwiatwoski- 
Phillips-Schmidt-Shin (KPSS) test [34] . 

To accommodate non-stationary noise, we adapt an 
idea presented by Rosen et al im, that assumes a time se¬ 
ries can be broken down into locally stationary segments. 
In their paper, they treat the number of stationary com¬ 
ponents of a non-stationary time series as unknown, and 
use RJMCMC [T7] to estimate the segment breaks. 

In a similar fashion, we break a non-stationary time 
series (or GW data stream) into J equal segments. We 
have two requirements for the length of these segments: 
the segment length is large enough for the Whittle ap¬ 
proximation to be valid; and the segments are locally 
stationary according to heuristics or formal stationarity 
hypothesis tests. This approach fits nicely into our cur¬ 
rent MCMC framework. For each segment, we estimate 
the PSD using the nonparametric method introduced in 
Section II C. A benefit of this approach is that change- 
points in the PSD can be detected without using RJM¬ 
CMC. 

The conditional posterior density for all noise model 
parameters 6 is the following product 

.7 

T^{9\f3,y) = Y[Trj{ej\f3,yj), (14) 

i=i 

where TTj{9j\P,yj) is the conditional posterior density of 
the model parameters 9j in the j**' segment given the 
signal parameters f3 and the segment of data fj. 

Note that under this set-up, the PC coefficients (3 do 
not depend on segments j = I, 2,..., J, since we require 
one set of PC coefficients (not J sets) to reconstruct a 
rotating core collapse GW signal. 

To sample /3|0, we use the same approach presented in 
Section II D. The only difference is in the construction 
of the noise covariance matrix. This is constructed as 
D = 27r X diag(/i(A),/ 2 (A),...,/j(A)), where fj{X) is 
the PSD of the noise segment. 


III. RESULTS 


For the following examples, we set L — max{20, 
and use the non-informative prior set-up of Choudhuri 
et al [32]. That is, let Go ~ Uniform[0,1],M = l,ar = 
/3r = 0.001, and 9k = 0.01. We use = 50 for most 
examples, and fcmax = 400 for the example with simu¬ 
lated Advanced LIGO noise to cater for the steep drop 
in the PSD at low frequencies. 

For the examples with a signal embedded in noise, we 
use a Uniform)—oo, oo) prior on the signal reconstruction 
parameters /3, and let d = 25 PCs. For a discussion on 
the optimal choice of PCs, we refer the reader to [21j . 
We also scale the signals to a signal-to-noise ratio (SNR) 
of e = 50. Here SNR (for n even) is defined as 


Q = 


\ 


T./2+1 


2E 

3=0 


|g(A,-)P 

l?(A,)P’ 


(15) 
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where Xj are the positive Fourier frequencies, s(.) is the 
Fourier transformed signal, and e(.) is the Fourier trans¬ 
formed noise series. Note that for the zero and Nyquist 
frequencies, the factor of 2 in Equation (15) becomes a 
factor of 4. 

The value of p = 50 is physically motivated, as we 
would expect to see an SNR of approximately 50 to 170 
for rotating core collapse supernova GWs at a distance 
of 10 kpc. We therefore demonstrate how the method 
works for the lower end of this range. 

The units for frequency in most examples are radians 
per second (rad/s). In the example using simulated Ad¬ 
vanced LIGO noise, we rescale to kilohertz (kHz). PSD 
units are the inverse of the frequency units, and the PSD 
figures are scaled logarithmically. GW strain amplitude 
is unitless. 

For all examples, we run the MGMG sampler for 
150,000 iterations, with a burn-in period of 50,000 and 
thinning factor of 10. This results in 10, 000 samples re¬ 
tained. 


A. Estimating the PSD of Non-Gaussian Coloured 
Noise 


To demonstrate how our model is capable of dealing 
with non-Gaussian transients in the data (or glitches as 
they are sometimes called in GW data analysis), we pro¬ 
vide an illustrative toy example, using coloured noise gen¬ 
erated from a first order autoregressive process, abbrevi¬ 
ated to AR(1). 

A mean-centered AR(1) process {Xt} is defined as 

Xt = pXt-i -l- £(, t = 1, 2,..., n, (16) 


where p is the first order autocorrelation, and e* is a white 
noise process (not necessarily Gaussian), with zero mean 
and constant variance a^. With this formulation, we see 
how the current observation at time t depends on the 
previous observation at time t — 1 through p, as well as 
some white noise Ct, often referred to as innovations or 
the innovation process in time series literature. 

The AR(1) model is a useful example here since it has a 
well-defined theoretical spectral density that we can com¬ 
pare our results against. Assuming \p\ < 1, the AR(1) 
process is stationary and has spectral density 


/(A) = Y 


+ p'^ — 2pcos27rA’ 


Ae(-7r,7r]. (17) 


As seen in Equation the AR(1) process has a 

PSD that is not flat, and the noise in our toy example is 
coloured (non-white), with correlations between frequen¬ 
cies — typical of what we would expect with real Ad¬ 
vanced LIGO noise. As the AR(1) process has a coloured 
spectrum, and white noise has a flat spectrum, we will 
call use the term innovations to refer to the white noise 
component of the model to avoid confusion. 

Eor our example, rather than using Gaussian innova¬ 
tions, which is the most common innovation process used 


in autoregressive models, we use Student-t innovations 
with 1 / = 3 degrees of freedom. The choice of ^ = 3 de¬ 
grees of freedom is the smallest integer that results in a 
Student-t model with finite variance (a requirement for 
the innovation process {et} of an AR(1) model). This 
model has wider tails than that of the Gaussian model 
(and in fact the widest tails possible whilst maintaining 
the finite variance requirement), meaning we can expect 
extreme values in the tails of the distribution to occur 
more often. This will be our proxy for glitches. 

We refer the reader to a relevant time series analysis 
textbook such as Brockwell and Davis m for further 
information on AR(I) processes. 

For this example, we generate a length n = 2^^ AR(1) 
process with p = —0.9 and Student-t innovations with 
y = 3 degrees of freedom. Let this (stationary) time 
series have sampling interval Aj = 1/2^^ (the same as 
Advanced LIGO). The data set-up can be seen in Fig¬ 
ure [TJ 
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FIG. 1: Simulated stationary AR(1) process with first- 
order autocorrelation p = —0.9, and Student-t innova¬ 
tions {v = 3 degrees of freedom). 
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We can see the effect of using v = 3 degrees of freedom 
in Figure Notice how there are transient high am¬ 
plitude non-Gaussian events. These are a result of the 
wide-tailed nature of the Student-t density. It would be 
very unlikely to see these high amplitude events if the 
innovation process was Gaussian. 

We now run the noise-only algorithm of Section II G to 
demonstrate that we can accurately characterize a non- 
Gaussian noise PSD. 
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FIG. 2: Estimated log PSD of the AR(1) time series in 
Figure 90% credible region (shaded pink) and poste¬ 
rior median log PSD (dashed blue), superimposed with 
the true log PSD (solid black). 


The estimated point-wise posterior median log PSD 
in Figure is very close to the true log PSD, and the 
90% credible region generally contains the true log PSD. 
This demonstrates that even if there are non-Gaussian 
transients in the data (which is certainly the case for real 
LIGO data), this PSD estimation method performs well. 
This is however not surprising as the Whittle likelihood 
gives a good approximation to Gaussian and some non- 
Gaussian likelihoods |29) . 


B. Extracting a Rotating Core Collapse Signal in 
Stationary Coloured Noise 


In this example, we aim to extract a rotating GW sig¬ 
nal from noisy data using the blocked Gibbs sampler de¬ 
scribed in Section II D. We embed the A1012.25 rotating 
core collapse GW signal from the Abdikamalov et al [25] 
test catalogue (i.e., a signal not part of the base cata¬ 
logue used to create the PC basis functions) in AR(1) 
noise with p = 0.9. For clarity, let this process have a 
Gaussian white noise innovation process with cr^ = 1. 
Let the time series be length n = 2^^, which corresponds 
to 1/4 s of data at the Advanced LIGO sampling rate. 
The signal is scaled to have a SNR of = 50. The recon¬ 
structed signal can be seen in Figure]^ 

The rotating core collapse GW signal in Figure is 
reconstructed particularly well during the collapse and 
bounce phases (the first few peaks/troughs). The post¬ 
bounce ring-down oscillations are usually poorly esti¬ 
mated due to stochastic dynamics mi US], but are ac¬ 
ceptable for this particular example. 



FIG. 3: Reconstructed rotating core collapse GW signal. 
90% credible region (shaded pink) and posterior median 
signal (dashed blue), superimposed with true A1012.25 
GW signal from the Abdikamalov et al [25] test catalogue 
(solid black). 


In this example, the signal parameters were simulta¬ 
neously estimated with the noise PSD using the blocked 
Gibbs sampler described in Section II D. We now com¬ 
pare the performance of the estimated noise PSD with 
and without a signal present. That is, we compare the 
noise PSD estimates between the algorithms presented in 
Section II G (noise-only model) and Section II D (signal- 
plus-noise model), using the same noise series for both 
models. 



0 12 3 
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—^True 
— Noise Only 
—^ Signal + Noise 


FIG. 4: Gomparison of the noise PSD estimates for the 
noise-only and signal-plus-noise models. Plotted are the 
point-wise posterior median log noise PSDs with and 
without a GW signal. The true log PSD of the AR(1) 
noise series is overlaid. 


We can see in Figure that both models (noise-only 
and signal-plus-noise) perform similarly when estimat¬ 
ing the PSD of coloured Gaussian noise. The poste¬ 
rior median log PSDs are approximately equal, and are 
very close to the true log PSD of an AR(1) process with 
p = 0.9. This is a useful robustness check, and demon¬ 
strates that we are successfully decoupling the signal from 
the noise. 
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C. Comparing Input and Reconstruction 
Parameters 

As there is no analytic form linking the astrophysical 
parameters of a rotating core collapse stellar event to 
its GW signal, we can only approximate the GW signal 
using statistical methods. We do this using PCR, but 
this means that there are no true input parameters that 
we can compare with the estimated signal reconstruction 
parameters. However, if one were to create a fictitious 
signal as a known linear combination of PCs, we could 
demonstrate the algorithm’s performance in estimating 
the signal reconstruction parameters. 

Consider the following fictitious rotating core collapse 
GW signal 

d 

i=l 

where y is the length n signal, (xi,X 2 , ... ,Xd) are the 
d PC basis vectors of length n, and (ai, 02 , ■ ■ ■, cid) are 
the “true” weights, or PC coefficients. To randomize 
the weights, we randomly sample each from the standard 
normal distribution. 

In this example, we embed the fictitious length n = 2^^ 
GW signal in AR(1) noise with p = 0.9 and Gaussian 
innovations with erg = 1. We set d = 10. We rescale the 
signal to have SNR g = 50, and after the algorithm has 
run, we rescale our estimated PC coefficients back to the 
original level for comparison. 


2 i 



0-1 
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1 23456789 10 

PC 

FIG. 5: Posterior median PC coefficients (blue square) 
and “true” PC coefficients (orange triangle) for the 10 
PCs of a fictitious GW signal embedded in AR(1) noise. 
The error bands are the 95% credible intervals. 

It can be seen in Figure that the “true” PC coef¬ 
ficients are generally contained within the 95% credible 
intervals, demonstrating that the algorithm can estimate 
a signal’s input parameters well in the presence of station¬ 
ary coloured noise. Notice also that the credible inter¬ 
vals widen as the principal component number increases. 
This is due to the fact that higher numbered PCs explain 
lower amounts of variation in the waveform catalogue, re¬ 
sulting in lower amplitude waves. We would therefore be 
more uncertain about these PCs embedded in noise. 


D. Extracting a Rotating Core Collapse Signal in 
Time-Varying Coloured Noise 

Non-stationary noise has a time-varying spectrum. To 
illustrate how our method can handle non-stationarities 
(or change-points in the spectral structure), we simulate 
a noise series with J = 2 locally stationary components 
of equal length ni = n 2 = 2^^. The first segment of 
the noise series is generated from an AR(1) process with 
p — 0.5. The second noise segment comes from an AR(1) 
process with p = —0.75. Both segments use a Gaussian 
innovation process with variance = \ for clarity. We 
embed part of the A108.25 waveform from the Abdika- 
malov et al catalogue |25j . This waveform is in the test 
set, not included in the construction of PC basis func¬ 
tions. The data set-up can be seen in Figure]^ 



FIG. 6: Snapshot of the signal superimposed on signal 
plus noise. The noise series has length n = ni + n 2 = 2^^ 
and is segmented into two equal parts. The first half 
of the noise is generated from an AR(1) with p = 0.5, 
and the second half is generated from an AR(1) with 
p = —0.75. Both segments use a Gaussian innovation 
process with variance = 1. The A108.25 rotating 
core collapse GW signal from the Abdikamalov et al test 
catalogue [53] is embedded in this noise with a SNR of 
g = 50. 


The aim here is to simultaneously estimate both noise 
PSDs, as well as reconstructing the embedded GW signal 
using the method described in Section II E. Here we are 
assuming the change-point between the two noise series 
is known, though we will demonstrate in the next section 
that our method can locate unknown change-points. 

Notice the difference between the first half of the noise 
series compared with the second half. Each segment 
has a different dependence structure, and are therefore 
coloured differently in the frequency-domain. This re¬ 
sults in a different time-domain morphology. Estimates 
of the noise PSDs can be seen in Figures and 
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FIG. 7: Spectral density estimate of the first noise seg¬ 
ment (p = 0.5) from Figure 90% credible region 
(shaded pink), posterior median log PSD (dashed blue), 
and theoretical log PSD (solid black). 



FIG. 9: Reconstructed rotating core collapse GW. 90% 
credible region (shaded pink) and posterior median sig¬ 
nal (dashed blue), superimposed with true A108.25 GW 
signal from the Abdikamalov et al [25] test catalogue 
(solid black). The first half of the signal was embedded 
in AR(1) noise with p = 0.5, and the second half had 
AR(1) noise with p = —0.75. Both noise segments had 
Gaussian white noise with = 1. 



FIG. 8: Spectral density estimate of the second noise 
segment (p = —0.75) from Figure]^ 90% credible region 
(shaded pink), posterior median log PSD (dashed blue), 
and theoretical log PSD (solid black). 


Figures]^ andshow the estimated log PSDs for the 
two noise segments. The point-wise posterior median log 
PSDs are close to the true log PSDs, and the 90% credi¬ 
ble regions for both segments mostly contain the true log 
PSDs, but veer slightly off towards the low frequencies. 
Due to posterior consistency of the PSD, these estimates 
will only get better as the sample size increases. Slight 
imperfections in the PSD estimates may not be such a 
problem if the embedded GW signal is extracted well, 
which happens to be the case in this example. The ex¬ 
tracted signal can be seen in Figure 


The 90% credible region for the reconstructed GW sig¬ 
nal in Figure [^generally contains the true signal, and has 
performed particularly well during collapse and bounce. 
Again, the post-bounce ring-down oscillations usually 
have the poorest reconstruction through the time series, 
but has performed remarkably well in this example, re¬ 
gardless of the slight imperfections of the PSD estimates. 


E. Detecting a Spectral Change-Point 

Consider a change-point problem similar to that of the 
previous section, where a time series exhibits a change 
in its spectral structure somewhere in the series. A 
valuable consequence of the algorithm presented in Sec¬ 
tion II E is its ability to detect change-points regardless 
of whether the change-point occurs within a segment or 
on the boundary. For the following examples, let n = 2^^ 
and break this into J = 32 equal length segments. For 
clarity, assume the time series does not contain an em¬ 
bedded GW signal. 

First consider the case where the change-point occurs 
on the boundary of two noise series. Let ni = n 2 = 2^^ 
be the lengths of each noise series, and let the first half of 
the time series be generated from an AR(1) with p = 0.5, 
and the second half from an AR(1) with p = —0.75. Both 
AR(1) processes have additive Gaussian innovations with 
CTj = 1. In this example, the change-point occurs ex¬ 
actly halfway through the series. Figure [T^ shows a time- 
frequency map of the estimated log PSDs for each seg¬ 
ment. 
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FIG. 10: Time-frequency map showing the estimated 
posterior median log PSDs for 32 segments of 1/4 s of 
AR(1) noise. The change-point in spectral structure oc¬ 
curs exactly halfway through the series. 

It is obvious that a change-point occurs halfway 
through Figure as there is a sheer change in the spec¬ 
tral structure at this point between segments 16 and 17. 
The first half of the time-frequency map exhibits stronger 
low-frequency behaviour, whereas the second half has 
more power in the higher frequencies. 

Now consider the case where the change-point occurs 
during a segment rather than on the boundary. Here, 
let each segment have the same set-up as before, but 
instead set rii = 2^^ — 2® and 712 = 2^^ -I- 2® such that 
a change-point occurs halfway through segment 16. A 
time-frequency map of the estimated log PSDs can be 
seen in Figure [TTj 



FIG. 11: Time-frequency map showing the estimated 
posterior median log PSDs for 32 segments of 1/4 s of 
AR(1) noise. The change-point in spectral structure oc¬ 
curs in the middle of segment 16 just before the halfway 
point. 


Figure demonstrates that there is a noticeable 
change-point roughly halfway through the series. There 
is a smoother transition from one PSD structure to the 
other than in the previous example since the true change- 
point occurs in the middle of a segment rather than on 
the boundary. 


These examples demonstrate that we can detect poten¬ 
tially unknown change-points in a time series. It is im¬ 
portant to note that if more segments are used, the time 
duration within each segment becomes smaller, and our 
accuracy in detecting the change-point increases. That is, 
the time at which the change-point occurs becomes more 
resolved if the segment durations are smaller. However, 
one must also ensure that the segment durations are long 
enough for the Whittle approximation to be valid. 


F. Simulated Advanced LIGO Noise 

In this example, we simulate Advanced LIGO noise and 
embed the AlOlO.25 rotating core collapse GW signal 
from the Abdikamalov et al [5S] catalogue in it, scaled to 
an SNR oi g = 50. We assume a one detector set-up, with 
linearly polarized GW signal (zero cross polarization). 
The Advanced LIGO sampling rate is = 2^'^ Hz, with 
a Nyquist frequency of r* = 2^® Hz. Let n = 2^^, which 
corresponds to quarter of a second of data. 

The simulated noise is Gaussian, and coloured by 
the Advanced LIGO design sensitivity PSD. Generating 
this noise blindly results in a perfect matching of the 
end-points and their derivatives, due to the simplified 
frequency-domain model. This is not realistic, since real 
data will often not have matching end-points. In order 
to make the noise generation more realistic, we internally 
generated a longer frequency-domain series (ten times 
longer), inverse discrete Fourier transformed it, and re¬ 
turned a fraction of it with a random starting point. This 
is referred to as “padding” the data. 
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FIG. 12: Estimated log PSD for simulated Advanced 
LIGO noise. 90% credible region (shaded pink) and 
posterior median (dashed blue) overlaid with log peri- 
odogram (solid grey). 

Figure shows the estimated log PSD and the 90% 
credible region, overlaid with the log periodogram. The 
method performs remarkably well, particularly at higher 
frequencies. Even though we will not be able to resolve 
frequencies below ^ 10-20 Hz at the Advanced LIGO 
design sensitivity, it is still interesting to see how this 
method performs at lower frequencies. Here, the low 
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frequency estimates are slightly off, but not by much. 
We believe this to be due to two factors: 1/4 s of simu¬ 
lated Advanced LIGO noise is actually a non-stationary 
series, and we did not adjust for non-stationarities (sim¬ 
ulated Advanced LIGO data is not stationary for more 
than 1/16 s based on the Augmented Dickey-Fuller test, 
Phillips-Perron unit root test, and KPSS test); and the 
Bernstein polynomial basis functions are notoriously slow 
to converge to a true function [33 ES]- These factors con¬ 
sidered, the method still provides a reasonable approxi¬ 
mation. 
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FIG. 13: Reconstructed rotating core collapse GW signal. 
90% credible interval (shaded pink) and posterior mean 
(dashed blue) overlaid with true AlOlO.25 signal (solid 
black) from Abdikamalov et al [25] test catalogue. Signal 
is scaled to a SNR oi g = 50. 


The resultant reconstructed GW signal can be seen in 
Figure [T^ The estimated signal here is very close to the 
true signal during the the collapse and bounce phases, 
as well as during the ring-down oscillations. The 90% 
credible region contains most of the true GW signal. 

We chose d = 25 PCs to reconstruct a rotating core 
collapse GW signal, but this could be too many or too 
few basis functions. Model selection methods similar to 
m were not investigated in the current study, and even 
though Figuresj^l^ and [TS] demonstrated good estimates 
during all phases (including ring-down), there is a de¬ 
mand for improved reconstruction methods. 

We then accommodated for non-stationarities in de¬ 
tector noise by breaking the series into smaller and lo¬ 
cally stationary components, and looked at the resulting 
time-varying spectrum. This can be seen in Figure |14| 
Rather than choosing J = 32 as in Section III E, non- 
stationarities in the Advanced LIGO noise become more 
apparent if we slice the noise series into fewer segments, 
each with longer duration. Instead, consider splitting 
the data into J = 8 equal length segments {rij = 2®). 
Here, the Whittle approximation is valid, and the seg¬ 
ments look locally stationary. 
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FIG. 14: Time-frequency map of the estimated time- 
varying noise spectrum for 8 segments of 1/4 s simulated 
Advanced LIGO noise. The posterior median log PSDs 
for each noise segment are used. 

Figure [14] illustrates that the Advanced LIGO PSD is 
changing over time. Notice that lower frequencies are 
gaining more power over time. Assuming that each seg¬ 
ment is locally stationary (which should be the case since 
the duration of each segment is less than 1/16 s), it is im¬ 
portant to accommodate for the changing nature of the 
PSD since the Ghoudhuri et al ^2] PSD estimation tech¬ 
nique is based on the theory of stationary processes. If we 
did not adjust for non-stationarities, estimates of astro- 
physically meaningful parameters could become biased. 


IV. DISCUSSION AND OUTLOOK 

This study was motivated by the need for an improved 
model for PSD estimation in GW data analysis. The 
assumptions of the standard GW noise model are too 
restrictive for Advanced LIGO data. GW data is sub¬ 
ject to high amplitude non-Gaussian transients, meaning 
that the Gaussian assumption is not valid. If the noise 
model is incorrectly specified, we could make misleading 
inferences. The stationarity assumption is also not valid, 
as simulated Advanced LIGO noise is not stationary for 
much longer than 1/16 s according to classical statistical 
hypothesis tests. Using off-source data to estimate the 
PSD is problematic since the PSD will naturally drift 
over time, and is not necessarily the same as on the GW 
source. 

The primary goal of this study was to develop a sta¬ 
tistical model that allows for on-source estimation of the 
PSD, while making no assumptions about the underly¬ 
ing noise distribution. We also wanted a method capable 
of accounting for non-stationary noise. Although we re¬ 
stricted our attention to GWs from rotating core collapse 
stellar events in this paper, our approach is perfectly valid 
for any GW signal embedded in noise. 

A secondary goal of this paper was to highlight to 
the GW community the rich and active area of Bayesian 
nonparametrics (and semiparametrics). We believe this 
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framework will be a very powerful toolbox going for¬ 
ward, particularly in the analysis of GW bursts, since 
accurate parametric models for these types of signals are 
limited. Further, our future research efforts regarding 
rotating core collapse events involves Bayesian nonpara- 
metric regression models to construct GWs from their 
initial conditions. Regularization methods, such as the 
Bayesian LASSO are also being considered. 

In this paper, we have assumed linearly polarized GWs 
to be detected by one interferometer. A relatively simple 
extension of this work is to include a network of detec¬ 
tors, as well as GWs with non-zero cross polarization. 
Another extension would be to assume an unknown sig¬ 
nal arrival time, as done in [501 [H]. These extensions can 
be expected in the second generation of the algorithm. 

The noise in our model was assumed to come from all 
sources, including detector noise, environmental noise, 
and statistical noise from parametric modelling of the 
signal. The statistical noise is the residual difference be¬ 
tween the true and fitted signals. An important factor 
to consider was whether statistical noise artificially dom¬ 
inated the noise. We do not believe this to be a domi¬ 
nating contributor to the overall noise. 

Since the “theoretical” PSD of Advanced LIGO at its 
design sensitivity has a very steep decrease at low fre¬ 
quencies until it reaches a minimum at roughly 230 Hz, it 
is difficult for our algorithm to perfectly characterize the 
shape at low frequencies without increasing computation 
significantly. This is due to the well-known slow conver¬ 
gence of Bernstein basis functions to a true curve. That 
is, many Bernstein polynomials (on order k = 1000) are 
required to accurately characterize the PSD of Advanced 
LIGO. Compare this to more well-behaved noise sources, 
such as those from autoregressive processes, which re¬ 
quire k < 50. We are currently developing a second 
generation of this algorithm, using a mixture of B-spline 
densities (normalized to the unit interval), rather than 
Beta densities. B-splines have much faster convergence 
rates than Bernstein polynomials [351136] . An additional 
benefit of changing the basis functions to B-splines is 
that, like BayesLine |7], we will be able to account for 
spectral lines by peak-loading knots at a priori known 
frequencies that these occur at. We have left estimation 
of spectral lines out of the scope of this paper, but believe 
that a change of basis functions from Bernstein polyno¬ 
mials to normalized B-spline densities could work well. 
Another interesting approach would be to model spectral 
lines with informative priors using a similar approach to 
Macaro [38] . 

We used non-informative priors in this analysis. It may 
be possible to translate the known shape of the Advanced 
LIGO design sensitivity PSD into a prior. This may also 
aid in improving PSD estimates at lower frequencies. 

We discussed a simplified method for estimating the 
time-varying PSD of non-stationary noise. Our approach 
assumed that a time series is split into equal length seg¬ 
ments, and at known times. We demonstrated that it is 
possible to identify change-points in a time series and its 


spectrum using this method, and that there is no need 
to estimate the locations of the segment splits. Thus, a 
fixed grid of known segment placements suffices, and no 
RJMCMG is required. RJMGMG would have slowed the 
algorithm down significantly, and created an entire new 
set of complications. 

There is much work to be done on PSD estimation. As 
the Advanced LIGO and Advanced Virgo interferometers 
swiftly approach design sensitivity, it is important that 
we continue to focus not only on parameter estimation 
techniques, but also on modelling detector noise. PSD es¬ 
timation is as important as parameter estimation, since 
we want to make honest statements about our observa¬ 
tions based on rigorous statistical theory. It is hoped that 
in the near future, we can converge on a PSD estimation 
method that is less strict than the standard noise model, 
works well on real detector data, and is based on good 
statistical theory. We believe that the methods presented 
in this paper are definitely a step in the right direction. 

Appendix A: Bernstein Polynomials and the Beta 
Density 

To define the Bernstein polynomial, we first need to 
discuss the Bernstein basis polynomials. There are k + 
1 Bernstein basis polynomials of degree fc, having the 
following form 

^3,k{x) = x)^~\ j = 0,1,... ,fc. (Al) 

A Bernstein polynomial is the following linear combi¬ 
nation of Bernstein basis polynomials 

k 

Bk{x) = '^Pjbj^kix), (A2) 

3=0 

where fij are called the Bernstein coefficients. 

As mentioned in Section II G, the Bernstein polyno¬ 
mial prior is a finite mixture of Beta probability densi¬ 
ties. We use the following parameterization for the Beta 
probability density function 

f{x\a,f3) = xf-\ (A3) 

(X x‘^-\l - xf-\ (A4) 

where x G (0,1), the shape parameters are positive real 
numbers (i.e., a > 0 and /3 > 0), and r(.) is the gamma 
function defined as the following improper integral 

nOO 

r(u) = / exp(—a;)da;. (A5) 

Jo 

Appendix B: The Dirichlet Distribution, Dirichlet 
Process, and Stick-breaking Construction 

The Dirichlet distribution is a multivariate generaliza¬ 
tion of the Beta distribution (defined in Appendix A) 
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with a probability density function defined on the K- Appendix C: Demonstration of Posterior 

dimensional simplex Consistency 


Ak = 



K 


> 0 , = 1 


(Bl) 


The probability density function of the Dirichlet dis¬ 
tribution is defined as 



where ai > = 1,..., K. 

The Dirichlet process is an infinite-dimensional gener¬ 
alization of the Dirichlet distribution. It is a probabil¬ 
ity distribution on the space of probability distributions, 
and is often used in Bayesian inference as a prior for infi¬ 
nite mixture models. One of the many representations of 
the Dirichlet process is Sethuraman’s stick-breaking con¬ 
struction [HEI]. This is useful for implementing MCMC 
sampling algorithms. 

Let G ~ DP(M, Go), where Go is the center measure, 
and M is the precision parameter (larger M implies a 
more precise prior). The Sethuraman representation is 


OO 


g = J2p^^z„ 

2^1 

(B3) 

II 

1 

(B4) 

V=i / 


Zi Go, 

(B5) 

U -- Beta(l,M). 

(B6) 


Consider a stick of unit length. The weights Pi asso¬ 
ciated with points Zi can be thought of as breaking this 
stick randomly into infinite segments. Break the stick at 
location Vi ~ Beta(l,M), assigning the mass Vi to the 
random point Zi ^ Go- Break the remaining length of 
the stick 1 — Vi by the proportion V 2 ^ Beta(l,M), as¬ 
signing the mass (1 —Vi)V 2 to the random point Z 2 ^ Gq. 
At the ***' step, break the remaining length of the stick 
n:cl(i — Vj) by the proportion Vi ~ Beta(l,M), as¬ 
signing the mass {ir^ (1 — Vj)j Vi to the random point 
Zi ~ Go- This process is repeated infinitely many times. 


It was proved in |22] that under very general condi¬ 
tions on the prior, the PSD estimation method used in 
this paper has the property of posterior consistency. We 
provide an illustrative example of this in Figure [T^ 



Sample 

size 



FIG. 15: Illustration of posterior consistency. The true 
log PSD (dashed black) overlaid with point-wise posterior 
median log PSDs of varying sample sizes. 

We generated AR(1) processes (with p = 0.9 and Gaus¬ 
sian white noise) of varying sample sizes, and compared 
their performance. It can be seen in Figure that as 
the sample size of the time series increases, the point- 
wise posterior median log PSD gets closer to the true log 
PSD, thus demonstrating posterior consistency. 
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